The ultimate performance of machine learning algorithms for classification tasks is usually measured in terms of the empirical error probability (or accuracy) on a testing dataset. However, these algorithms are optimized by minimizing a typically different, more convenient, loss function on a training set. For classification tasks, this loss function is usually the negative log-loss, which leads to the well-known cross-entropy risk and is typically better behaved (from a numerical point of view) than the error probability. Conventional studies of the generalization error usually do not take into account this underlying mismatch between the losses used in the training and testing phases. In this work, an analysis of generalization based on a point-wise PAC approach is considered, accounting for training under the negative log-loss and testing under the accuracy metric. We label this analysis PACMAN. Building on the fact that the mentioned mismatch can be written as a likelihood ratio, concentration inequalities can be used to provide insights into the generalization problem in terms of point-wise PAC bounds that depend on some meaningful information-theoretic quantities. An analysis of the resulting bounds and a comparison with available results in the literature are also provided.
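To fix ideas, the following LaTeX snippet spells out the two quantities whose mismatch the analysis targets: the negative log-loss (cross-entropy) minimized during training and the error probability measured at test time. The notation ($Q_\theta$ for the learned posterior, $P_{XY}$ for the data distribution) is our own illustrative choice, not necessarily the paper's, and the point-wise PAC bounds themselves are not reproduced here.

```latex
% Illustrative notation (ours): Q_\theta is the learned posterior, P_{XY} the data law.
\[
\underbrace{\widehat{L}_{\log}(\theta) = \frac{1}{n}\sum_{i=1}^{n} -\log Q_\theta(y_i \mid x_i)}_{\text{training: negative log-loss / cross-entropy}}
\qquad\text{vs.}\qquad
\underbrace{P_e(\theta) = \Pr_{(X,Y)\sim P_{XY}}\!\big[\hat y_\theta(X) \neq Y\big]}_{\text{testing: error probability}},
\qquad \hat y_\theta(x) = \arg\max_{y} Q_\theta(y \mid x).
\]
```

The gap between quantities of this kind is what the point-wise PAC analysis controls; expressing the mismatch between the two evaluation measures as a likelihood ratio is what makes Chernoff-type concentration inequalities applicable.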
Overfitting data is a well-known phenomenon with generative models, which fit a particular data instance too closely (or exactly) and may therefore fail to predict future observations reliably. In practice, this behaviour is controlled by various, sometimes heuristic, regularization techniques, which are motivated by the development of upper bounds on the generalization error. In this work, we study generalization error bounds that rely on random encodings under the cross-entropy loss, as commonly used in deep learning for classification problems. We derive bounds showing that there exist regimes in which the generalization error is bounded by the mutual information between input features, randomly generated according to the encoding distribution, and the corresponding representations in the latent space. Our bounds provide an information-theoretic understanding of generalization in so-called variational classifiers, which are regularized by a Kullback-Leibler (KL) divergence term. These results give a theoretical justification for the highly popular KL term in variational inference methods, which has already been recognized to act effectively as a regularization penalty. We further observe connections with well-studied notions such as variational autoencoders, information dropout, the information bottleneck, and Boltzmann machines. Finally, we perform numerical experiments on the MNIST and CIFAR datasets and show that the mutual information is indeed highly representative of the behaviour of the generalization error.
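To make the role of the KL term concrete, here is a minimal PyTorch-style sketch of a KL-regularized (variational) classifier of the kind the bounds speak to; the architecture, latent dimension and weight `beta` are illustrative assumptions, not the authors' experimental setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalClassifier(nn.Module):
    """Stochastic encoder q(z|x) followed by a classifier head p(y|z)."""
    def __init__(self, in_dim=784, latent_dim=32, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.classifier = nn.Linear(latent_dim, n_classes)

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.classifier(z), mu, logvar

def loss_fn(logits, y, mu, logvar, beta=1e-3):
    # Cross-entropy (negative log-loss) plus the KL regularizer whose role
    # the bounds relate to the input/representation mutual information.
    ce = F.cross_entropy(logits, y)
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    return ce + beta * kl

# Toy usage on random data shaped like flattened MNIST digits.
x, y = torch.randn(16, 784), torch.randint(0, 10, (16,))
logits, mu, logvar = VariationalClassifier()(x)
loss = loss_fn(logits, y, mu, logvar)
```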
This paper investigates a multiterminal source coding problem under a logarithmic loss fidelity, which does not necessarily lead to an additive distortion measure. The problem is motivated by an extension of the Information Bottleneck method to a multi-source scenario, in which several encoders have to build cooperatively rate-limited descriptions of their sources in order to maximize the information about other unobserved (hidden) sources. More precisely, we study the fundamental information-theoretic limits of: (i) the Two-way Collaborative Information Bottleneck (TW-CIB) and (ii) the Collaborative Distributed Information Bottleneck (CDIB) problems. In the TW-CIB problem, two distant encoders separately observe marginal (dependent) components $X_1$ and $X_2$, and can cooperate through multiple exchanges of limited information with the goal of extracting information about hidden variables $(Y_1, Y_2)$, which can be arbitrarily dependent on $(X_1, X_2)$. In CDIB, on the other hand, there are two cooperating encoders which separately observe $X_1$ and $X_2$, and a third node which can listen to the exchanges between the two encoders in order to obtain information about a hidden variable $Y$. The relevance (figure of merit) is measured in terms of a normalized (per-sample) multi-letter mutual information metric (log-loss fidelity), and an interesting tradeoff arises by constraining the complexity of the descriptions, measured in terms of the rates needed for the exchanges between the encoders and decoders. Inner and outer bounds on the complexity-relevance regions of these problems are derived, from which optimality is characterized for several cases of interest. The resulting theoretical complexity-relevance regions are finally evaluated for binary symmetric and Gaussian statistical models.
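As background, a compact LaTeX statement of the standard single-encoder Information Bottleneck tradeoff that TW-CIB and CDIB generalize is sketched below; the notation ($U$ for the compressed description, $\beta$ for the tradeoff weight) is ours, and the multi-terminal regions studied in the paper do not reduce to it in general.

```latex
% Standard single-encoder Information Bottleneck (background only; notation ours).
% U is a rate-limited description of X obeying the Markov chain U - X - Y.
\[
\max_{p(u \mid x)} \; I(U;Y) \quad \text{s.t.} \quad I(U;X) \le R,
\qquad \text{or, in Lagrangian form,} \qquad
\max_{p(u \mid x)} \; \big[\, I(U;Y) - \beta\, I(U;X) \,\big].
\]
```

Here $I(U;Y)$ plays the role of relevance and $I(U;X)$ (or the rate $R$) the role of complexity; the collaborative problems replace the single description $U$ with rate-limited exchanges between two encoders.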
In intensively managed forests in Europe, where forests are divided into stands of small size and may show heterogeneity within stands, a high spatial resolution (10 - 20 meters) is arguably needed to capture the differences in canopy height. In this work, we developed a deep learning model based on multi-stream remote sensing measurements to create a high-resolution canopy height map over the "Landes de Gascogne" forest in France, a large maritime pine plantation of 13,000 km$^2$ with flat terrain and intensive management. This area is characterized by even-aged and mono-specific stands, of a typical length of a few hundred meters, harvested every 35 to 50 years. Our deep learning U-Net model uses multi-band images from Sentinel-1 and Sentinel-2 with composite time averages as input to predict tree height derived from GEDI waveforms. The evaluation is performed with external validation data from forest inventory plots and a stereo 3D reconstruction model based on Skysat imagery available at specific locations. We trained seven different U-net models based on a combination of Sentinel-1 and Sentinel-2 bands to evaluate the importance of each instrument in the dominant height retrieval. The model outputs allow us to generate a 10 m resolution canopy height map of the whole "Landes de Gascogne" forest area for 2020 with a mean absolute error of 2.02 m on the Test dataset. The best predictions were obtained using all available satellite layers from Sentinel-1 and Sentinel-2 but using only one satellite source also provided good predictions. For all validation datasets in coniferous forests, our model showed better metrics than previous canopy height models available in the same region.
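The pipeline described above (multi-band Sentinel-1/Sentinel-2 composites in, per-pixel dominant height out, trained against GEDI-derived heights) can be illustrated with a minimal sketch; the band count, network depth, tile size and loss below are assumptions for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class MiniUNet(nn.Module):
    """Tiny U-Net regressing one height value per pixel from stacked S1/S2 bands."""
    def __init__(self, in_bands=12):  # number of stacked bands is an assumption
        super().__init__()
        self.enc1 = conv_block(in_bands, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)
        self.head = nn.Conv2d(32, 1, 1)  # canopy height in meters

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return self.head(d1)

model = MiniUNet(in_bands=12)
x = torch.randn(2, 12, 64, 64)           # two 64x64-pixel tiles at 10 m resolution
target = torch.rand(2, 1, 64, 64) * 30   # placeholder GEDI-derived dominant heights
loss = nn.L1Loss()(model(x), target)     # L1 matches the mean-absolute-error metric
```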
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer.github.io
Simulating quantum channels is a fundamental primitive in quantum computing, since quantum channels define general (trace-preserving) quantum operations. An arbitrary quantum channel cannot be exactly simulated using a finite-dimensional programmable quantum processor, making it important to develop optimal approximate simulation techniques. In this paper, we study the challenging setting in which the channel to be simulated varies adversarially with time. We propose the use of matrix exponentiated gradient descent (MEGD), an online convex optimization method, and analytically show that it achieves a sublinear regret in time. Through experiments, we validate the main results for time-varying dephasing channels using a programmable generalized teleportation processor.
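A minimal sketch of the matrix exponentiated gradient descent update on the set of density matrices (positive semidefinite, unit trace) is given below; the quadratic toy loss, dimension and step size are placeholders, not the paper's channel-simulation objective or its regret analysis.

```python
import numpy as np
from scipy.linalg import expm, logm

def megd_step(X, grad, eta):
    """One matrix exponentiated gradient step: mirror descent over
    positive semidefinite, unit-trace matrices (density matrices)."""
    M = expm(logm(X) - eta * grad)
    M = (M + M.conj().T) / 2          # re-symmetrize against numerical drift
    return M / np.trace(M).real       # project back to unit trace

# Toy usage: track an (adversarially) time-varying target state under a quadratic loss.
rng = np.random.default_rng(0)
d = 2
X = np.eye(d) / d                     # start from the maximally mixed state
for t in range(50):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    target = A @ A.conj().T
    target /= np.trace(target).real   # time-varying target density matrix
    grad = 2 * (X - target)           # gradient of ||X - target||_F^2
    X = megd_step(X, grad, eta=0.1)
```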
Electricity prices in liberalized markets are determined by the supply and demand for electric power, which are in turn driven by various external influences that vary strongly in time. In perfect competition, the merit order principle describes that dispatchable power plants enter the market in the order of their marginal costs to meet the residual load, i.e. the difference of load and renewable generation. Many market models implement this principle to predict electricity prices but typically require certain assumptions and simplifications. In this article, we present an explainable machine learning model for the prices on the German day-ahead market, which substantially outperforms a benchmark model based on the merit order principle. Our model is designed for the ex-post analysis of prices and thus builds on various external features. Using Shapley Additive exPlanation (SHAP) values, we can disentangle the role of the different features and quantify their importance from empiric data. Load, wind and solar generation are most important, as expected, but wind power appears to affect prices stronger than solar power does. Fuel prices also rank highly and show nontrivial dependencies, including strong interactions with other features revealed by a SHAP interaction analysis. Large generation ramps are correlated with high prices, again with strong feature interactions, due to the limited flexibility of nuclear and lignite plants. Our results further contribute to model development by providing quantitative insights directly from data.
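A hedged sketch of the kind of ex-post SHAP analysis described above, using a generic tree ensemble on synthetic stand-in features; the column names, model choice and data are illustrative assumptions, not the authors' pipeline or results.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical ex-post feature table; column names are illustrative stand-ins.
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "load_MW": rng.uniform(35e3, 80e3, 1000),
    "wind_MW": rng.uniform(0, 45e3, 1000),
    "solar_MW": rng.uniform(0, 35e3, 1000),
    "gas_price": rng.uniform(10, 120, 1000),
    "ramp_MW_per_h": rng.normal(0, 5e3, 1000),
})
# Placeholder target: price rises with residual load and fuel price.
y = 0.002 * (X["load_MW"] - X["wind_MW"] - X["solar_MW"]) \
    + 0.5 * X["gas_price"] + rng.normal(0, 5, 1000)

model = GradientBoostingRegressor().fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)                # per-sample feature attributions
interactions = explainer.shap_interaction_values(X)   # pairwise interaction effects
mean_abs = np.abs(shap_values).mean(axis=0)           # global importance ranking
print(dict(zip(X.columns, mean_abs.round(2))))
```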
Despite the impact of psychiatric disorders on clinical health, early-stage diagnosis remains a challenge. Machine learning studies have shown that classifiers tend to be overly narrow in the diagnosis prediction task. The overlap between conditions leads to high heterogeneity among participants that is not adequately captured by classification models. To address this issue, normative approaches have surged as an alternative method. By using a generative model to learn the distribution of healthy brain data patterns, we can identify the presence of pathologies as deviations or outliers from the distribution learned by the model. In particular, deep generative models showed great results as normative models to identify neurological lesions in the brain. However, unlike most neurological lesions, psychiatric disorders present subtle changes widespread in several brain regions, making these alterations challenging to identify. In this work, we evaluate the performance of transformer-based normative models to detect subtle brain changes expressed in adolescents and young adults. We trained our model on 3D MRI scans of neurotypical individuals (N=1,765). Then, we obtained the likelihood of neurotypical controls and psychiatric patients with early-stage schizophrenia from an independent dataset (N=93) from the Human Connectome Project. Using the predicted likelihood of the scans as a proxy for a normative score, we obtained an AUROC of 0.82 when assessing the difference between controls and individuals with early-stage schizophrenia. Our approach surpassed recent normative methods based on brain age and Gaussian Process, showing the promising use of deep generative models to help in individualised analyses.
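The evaluation step, using the likelihood assigned by the normative model as a deviation score and measuring separability with AUROC, can be sketched as follows; the log-likelihood values below are synthetic placeholders, not outputs of the trained transformer.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Placeholder per-scan log-likelihoods from a generative (normative) model;
# real values would come from the trained model on held-out scans.
rng = np.random.default_rng(2)
ll_controls = rng.normal(loc=-100.0, scale=5.0, size=60)   # neurotypical controls
ll_patients = rng.normal(loc=-108.0, scale=7.0, size=33)   # early-stage schizophrenia

# Lower likelihood under the normative model = larger deviation from "healthy".
scores = np.concatenate([-ll_controls, -ll_patients])      # deviation scores
labels = np.concatenate([np.zeros(60), np.ones(33)])       # 1 = patient
print("AUROC:", roc_auc_score(labels, scores))
```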
We examined multiple deep neural network (DNN) architectures for suitability in predicting neurotransmitter concentrations from labeled in vitro fast scan cyclic voltammetry (FSCV) data collected on carbon fiber electrodes. Suitability is determined by the predictive performance in the "out-of-probe" case, the response to artificially induced electrical noise, and the ability to predict when the model will be errant for a given probe. This work extends prior comparisons of time series classification models by focusing on this specific task. It extends previous applications of machine learning to the FSCV task by using a much larger data set and by incorporating recent advancements in deep neural networks. The InceptionTime architecture, a deep convolutional neural network, had the best absolute predictive performance of the models tested but was more susceptible to noise. A naive multilayer perceptron architecture had the second lowest prediction error and was less affected by the artificial noise, suggesting that convolutions may not be as important for this task as one might suspect.
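For concreteness, a sketch of the kind of naive multilayer perceptron baseline referred to above, mapping one voltammogram sweep to a predicted concentration; the sweep length and layer widths are assumptions, not the architecture evaluated in the paper.

```python
import torch
import torch.nn as nn

SWEEP_LEN = 1000   # samples per FSCV voltammogram sweep (assumed)

# A plain fully connected regressor from a sweep to a concentration value.
mlp = nn.Sequential(
    nn.Linear(SWEEP_LEN, 512), nn.ReLU(),
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 1),          # predicted neurotransmitter concentration
)

voltammograms = torch.randn(8, SWEEP_LEN)        # batch of background-subtracted sweeps
concentration = mlp(voltammograms)               # shape (8, 1)
loss = nn.MSELoss()(concentration, torch.rand(8, 1))  # placeholder labels
```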
Graph learning problems are typically approached by focusing on learning the topology of a single graph when signals from all nodes are available. However, many contemporary setups involve multiple related networks and, moreover, it is often the case that only a subset of nodes is observed while the rest remain hidden. Motivated by this, we propose a joint graph learning method that takes into account the presence of hidden (latent) variables. Intuitively, the presence of the hidden nodes renders the inference task ill-posed and challenging to solve, so we overcome this detrimental influence by harnessing the similarity of the estimated graphs. To that end, we assume that the observed signals are drawn from a Gaussian Markov random field with latent variables and we carefully model the graph similarity among hidden (latent) nodes. Then, we exploit the structure resulting from the previous considerations to propose a convex optimization problem that solves the joint graph learning task by providing a regularized maximum likelihood estimator. Finally, we compare the proposed algorithm with different baselines and evaluate its performance over synthetic and real-world graphs.
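A hedged cvxpy sketch of the regularized maximum-likelihood idea: each precision matrix is split into a sparse part (observed graph) minus a PSD low-rank part (effect of hidden nodes), and a penalty couples the two estimated graphs. The exact formulation, penalties and weights in the paper may differ; the data and parameter values below are placeholders.

```python
import cvxpy as cp
import numpy as np

# Hypothetical setup: two related graphs over n observed nodes; hidden nodes are
# absorbed into low-rank terms L1, L2 (sparse-plus-low-rank latent-variable GMRF).
n = 10
rng = np.random.default_rng(3)
A = rng.normal(size=(n, 2 * n))
C1 = np.cov(A)                                     # placeholder empirical covariances
C2 = np.cov(A + 0.1 * rng.normal(size=A.shape))    # of the signals on each graph

S1, S2 = cp.Variable((n, n), symmetric=True), cp.Variable((n, n), symmetric=True)
L1, L2 = cp.Variable((n, n), PSD=True), cp.Variable((n, n), PSD=True)
alpha, beta, gamma = 0.05, 0.1, 0.2    # sparsity, low-rank, and similarity weights

def nll(C, S, L):
    # Negative Gaussian log-likelihood with precision matrix S - L.
    return -cp.log_det(S - L) + cp.trace(C @ (S - L))

objective = (nll(C1, S1, L1) + nll(C2, S2, L2)
             + alpha * (cp.sum(cp.abs(S1)) + cp.sum(cp.abs(S2)))
             + beta * (cp.trace(L1) + cp.trace(L2))
             + gamma * cp.sum(cp.abs(S1 - S2)))     # encourage similar graphs

constraints = [S1 - L1 >> 0, S2 - L2 >> 0]          # valid precision matrices
cp.Problem(cp.Minimize(objective), constraints).solve()
print(S1.value.round(2))
print(S2.value.round(2))
```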